Abstract:

We focus on the problem of unsupervised clustering in which the
optimal number of clusters is set automatically. We present a
generalization of the competitive agglomeration clustering
algorithm first introduced in (Frigui and Krishnapuram, 1997).
This generalization is inspired by regularization theory and
suggests a new schema for using the various cluster validity criteria
continually proposed in the literature. As a consequence of this
generalization, we introduce new clustering objective functions
and present their associated optimal solutions. We present an
application of this competitive clustering schema to color image
segmentation in order to perform partial queries in the context of
image retrieval by content. In this case, each pixel is represented
by the color distribution in its vicinity. The clustering algorithm
has to incorporate an appropriate distance measure to compare
feature vector similarity.

1. Introduction

Our work was motivated by requirements and constraints in the
context of image retrieval by content. Most systems use the
query-by-example approach, performing queries such as "show me more
images that look like this one". Most often, however, the user is
more specifically interested in specifying an object (or region) and
in retrieving more images with similar objects (or regions), as
opposed to similar images as a whole. Our aim is to allow the user to
perform a query on image parts (regions of interest). Methods range
from manual region delimitation [11] and systematic image subdivision
without segmentation [10] to approximate region segmentation [9]. In
this paper, we address the problem of clustering-based segmentation
of each image in the database to allow partial queries.

Most popular clustering algorithms have the drawback of requiring a
predefined number of clusters. Since image databases are often huge,
setting the number of clusters in advance for each image is not
viable. This requirement motivates our interest in unsupervised
clustering. In section 2, we describe the existing competitive
clustering algorithm. We address its explanation and generalization
in section 3. The computation and interpretation of the update
equations are given in section 4. We deal with color image
description and segmentation and show some results in section 5.



2. Initial algorithm

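The competitive agglomeration (CA) clustering schema was first
proposed in [1] to minimize the following objective function:

$$J = \sum_{i=1}^{C}\sum_{j=1}^{N} \mu_{ij}^{2}\, d_{ij}^{2} - \alpha \sum_{i=1}^{C}\Big[\sum_{j=1}^{N}\mu_{ij}\Big]^{2}$$

with the constraint:

$$\sum_{i=1}^{C}\mu_{ij} = 1 \quad \forall\, 1 \le j \le N \quad \text{and} \quad \mu_{ij} \in [0, 1]$$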

where $\mu_{ij}$ is the membership degree of the j-th data point
$x_j$ in the i-th cluster, $d_{ij}$ is their distance, N is the
total number of gray levels and C the number of clusters to be found
(which is dynamically updated). The initial CA partition has an
over-specified number of clusters, which is dynamically reduced as
the algorithm progresses. At convergence, the final partition has
the optimal number of clusters.
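
To make the two parts of this criterion concrete, here is a minimal Python sketch that evaluates it for a given partition. It assumes the usual CA form (squared memberships weighted by squared distances, minus $\alpha$ times the sum of squared cluster cardinalities); the function name and the nested-list layout `u[i][j]`, `d2[i][j]` are illustrative choices, not taken from [1].

```python
def ca_objective(u, d2, alpha):
    """Evaluate the assumed CA criterion for one partition.

    u[i][j]  : membership of data point j in cluster i
    d2[i][j] : squared distance between point j and cluster i
    alpha    : weight of the agglomeration term
    """
    # First component: FCM-like fidelity term.
    fidelity = sum(
        u[i][j] ** 2 * d2[i][j]
        for i in range(len(u))
        for j in range(len(u[i]))
    )
    # Second component: sum of squared (fuzzy) cluster cardinalities.
    agglomeration = sum(sum(row) ** 2 for row in u)
    return fidelity - alpha * agglomeration
```

Raising `alpha` rewards partitions that concentrate the membership mass in few clusters, which is the mechanism that lets spurious clusters empty out.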

The objective function combines two components. The first one is
similar to the FCM [3] objective function and has a global minimum
when each data point is in a separate cluster. The global minimum of
the second component is achieved when all points are in the same
cluster, so it controls the number of clusters. The two components
are combined by the parameter $\alpha$, which is chosen to balance
their contributions:

$$\alpha \propto \frac{\sum_{i=1}^{C}\sum_{j=1}^{N}\mu_{ij}^{2}\, d_{ij}^{2}}{\sum_{i=1}^{C}\big[\sum_{j=1}^{N}\mu_{ij}\big]^{2}}$$
The value of $\alpha$ decreases slowly, so that it favors
agglomeration in the first iterations while emphasizing the
objective function in the later ones.

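The update equation for the memberships is:

$$\mu_{ij} = \frac{1/d_{ij}^{2}}{\sum_{k=1}^{C} 1/d_{kj}^{2}} + \frac{\alpha}{d_{ij}^{2}}\left(N_i - \bar{N}_j\right)$$

and can be written as:

$$\mu_{ij} = \mu_{ij}^{FCM} + \mu_{ij}^{Bias}$$

where:

$$N_i = \sum_{j=1}^{N}\mu_{ij} \quad \text{and} \quad \bar{N}_j = \frac{\sum_{k=1}^{C} (1/d_{kj}^{2})\, N_k}{\sum_{k=1}^{C} 1/d_{kj}^{2}}$$

are respectively the cardinality of cluster i and a weighted
average of the cluster cardinalities, where the weight reflects the
proximity of each cluster to the data point $x_j$.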

$\mu_{ij}^{FCM}$ is the membership obtained from the FCM objective
function [3], while the sign of $\mu_{ij}^{Bias}$ expresses the
competition between clusters and leads to a gradual reduction of the
cardinality of spurious clusters. A given cluster vanishes when its
cardinality $N_i$ is less than a required minimum.
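
The competition mechanism can be illustrated with a short Python sketch of one membership update. It assumes the membership is the sum of an FCM term and a bias term proportional to $\alpha / d_{ij}^2$ times the gap between the cluster cardinality and a distance-weighted average cardinality; the function name and data layout are hypothetical, and clipping of memberships to [0, 1] as well as the removal of small clusters are left out.

```python
def ca_membership_update(u, d2, alpha):
    """One CA-style membership update (sketch under stated assumptions).

    u[i][j]  : current membership of point j in cluster i
    d2[i][j] : squared distance, assumed strictly positive
    """
    c, n = len(u), len(u[0])
    cardinality = [sum(u[i]) for i in range(c)]  # N_i
    new_u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        inv = [1.0 / d2[i][j] for i in range(c)]
        inv_sum = sum(inv)
        # Weighted average cardinality as seen from point j.
        avg_card = sum(inv[i] * cardinality[i] for i in range(c)) / inv_sum
        for i in range(c):
            fcm_term = inv[i] / inv_sum
            bias_term = alpha * inv[i] * (cardinality[i] - avg_card)
            new_u[i][j] = fcm_term + bias_term
    return new_u
```

The bias terms of one data point sum to zero, so the memberships of each point still sum to one; clusters with above-average cardinality gain membership and the others lose it.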

On the other hand, a similar unsupervised clustering algorithm,
based on entropy minimization, was proposed independently in the
Bayesian framework [2]:



/=ljeS^L 



Ain p. 



2a^ 
When $\sigma^2 = 0$, U is the energy of the k-means algorithm.
We aim to explain the relationship between the two
algorithms in the following section.



3. Generalization

Both objective functions aim to minimize a criterion J that we
write as:

$$J = J_1 + \alpha J_2$$

J is the combination of two terms with antagonistic behavior:

■ $J_1$ expresses the fidelity to the data. This term is minimal
when each data point constitutes a separate cluster.

■ $J_2$ is a complexity reduction term. It is minimal when all data
points are in the same cluster. Uncertainty is maximal in this case.

Note that this schema is similar to the energy minimization model
[13][12] when the approximation function is piecewise constant.

The optimal number of clusters is determined by a balance between
these two opposing terms. We note that $J_2$ reduces the complexity
of the data partition in the feature space. We would like to
emphasize here that cluster validity theory has exactly the same
goal. Several criteria are continually proposed in the literature to
estimate the quality of a given partition by reducing partition
complexity. Also, we note that the second term used in the Bayesian
framework, consisting of an entropy measure, has already been
proposed by Bezdek [3] as a validity criterion. It is now clear that
all validity criteria are possible instantiations of the $J_2$ term.
This allows us to explain the relationship between the two initial
algorithms and suggests a family of other possible $J_2$ terms.
Recall that, in the literature, validity criteria are used
sequentially, as follows:

■ compute different data partitions for $c = 2, \ldots, c_{max}$

■ at each convergence, compute the value of the validity criterion

■ seek the extremum value of the validity criterion and set the
optimal number of clusters to the corresponding value of c.

With the newly proposed schema, validity criteria are taken into
account "in parallel", i.e. at the same time as the partition, so
that only one convergence process has to be computed.

In the following section we derive the solution for two new
objective functions based on entropy measures. We note that they are
not unique and that they have the advantage of being numerically
simple.

4. Optimization and interpretation 

We present the optimization procedure with two validity criteria
already proposed in [3]. They refer to information theory and have
the advantage of being less complex to optimize than many more
recently proposed validity criteria [5][6]. Entropy measures take
their maximal value when the partition is completely "fuzzy" and the
membership matrix is constant and equal to 1/c.
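With the constraint $\sum_{i=1}^{c}\mu_{ij} = 1\ \forall j$, we
obtain ($\lambda_j$ are Lagrange multipliers):

$$L(U) = J + \sum_{j=1}^{N}\lambda_j\Big(\sum_{i=1}^{c}\mu_{ij} - 1\Big)$$

Setting the derivative of L with respect to $\mu_{ij}$ to zero and
using the constraint gives:

$$\sum_{i=1}^{c}\frac{\alpha\,(\log\mu_{ij} + 1) - N\lambda_j}{2N d_{ij}^{2}} = 1$$

Solving this equation for $\lambda_j$ leads to:

$$\lambda_j = \frac{\alpha\sum_{s=1}^{c}(\log\mu_{sj} + 1)/d_{sj}^{2} - 2N}{N\sum_{s=1}^{c} 1/d_{sj}^{2}}$$

Substituting $\lambda_j$ in the expression of $\mu_{ij}$, we obtain
the following update equation for the membership of a data point
$x_j$ to cluster i:

$$\mu_{ij} = \frac{1/d_{ij}^{2}}{\sum_{s=1}^{c} 1/d_{sj}^{2}} + \frac{\alpha}{2N d_{ij}^{2}}\left(\log\mu_{ij} - \frac{\sum_{s=1}^{c}(1/d_{sj}^{2})\log\mu_{sj}}{\sum_{s=1}^{c} 1/d_{sj}^{2}}\right)$$
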
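As for the initial algorithm, we obtain:

$$\mu_{ij} = \mu_{ij}^{FCM} + \mu_{ij}^{Bias}$$

where:

$$\mu_{ij}^{FCM} = \frac{1/d_{ij}^{2}}{\sum_{s=1}^{c} 1/d_{sj}^{2}}$$

and:

$$\mu_{ij}^{Bias} = \frac{\alpha}{2N d_{ij}^{2}}\left(\log\mu_{ij} - \langle\log\mu_{ij}\rangle_j\right)$$

with:

$$\langle\log\mu_{ij}\rangle_j = \frac{\sum_{s=1}^{c}(1/d_{sj}^{2})\log\mu_{sj}}{\sum_{s=1}^{c} 1/d_{sj}^{2}}$$

which represents the weighted average of $\log\mu_{ij}$ over all
clusters s.

The sign of $\mu_{ij}^{Bias}$ drives the competition process, as in
the initial algorithm, by reinforcing or reducing the cluster
cardinality.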

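In the same way, we present the update equation for statistical
entropy:

$$J = \sum_{i=1}^{c}\sum_{j=1}^{N}\mu_{ij}^{2}\, d_{ij}^{2} - \alpha\sum_{i=1}^{c} p_i \log p_i$$

with:

$$\sum_{i=1}^{c}\mu_{ij} = 1\ \ \forall j \quad \text{and} \quad p_i = \frac{1}{N}\sum_{j=1}^{N}\mu_{ij} = \frac{N_i}{N}$$

$N_i$ is the cardinality of cluster i. Following the same
optimization procedure as for the partition entropy, we finally
obtain the following membership update equation:

$$\mu_{ij} = \frac{1/d_{ij}^{2}}{\sum_{s=1}^{c} 1/d_{sj}^{2}} + \frac{\alpha}{2N d_{ij}^{2}}\left(\log p_i - \langle\log p_s\rangle_j\right)$$

where $p_i = N_i/N$ and $p_s = N_s/N$ represent respectively the
probabilities of clusters i and s, or the relative cardinalities of
these clusters, with:

$$\langle\log p_s\rangle_j = \frac{\sum_{s=1}^{c}(1/d_{sj}^{2})\log p_s}{\sum_{s=1}^{c} 1/d_{sj}^{2}}$$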
Partition entropy has the advantage of being more robust to the
initialization of the parameter $\alpha$ than statistical entropy.
Note that, in both cases, we have the same competition effect,
described by the sign of $\mu_{ij}^{Bias}$, which is given by the
difference between the current cluster membership parameter and a
particular weighted average over the other clusters' memberships,
where the weight reflects the proximity to the current data point.

5. Application to image segmentation 

We apply the CA schema to color image segmentation in the context
of content-based image retrieval. Segmentation allows performing a
query on a region of interest instead of the usual query on the
whole image (query by example). In this context we need an
unsupervised clustering algorithm, since we cannot set the number of
clusters for each image. For each pixel, we use as color feature the
color distribution represented by its local 3D histogram. The
distance measure for clustering is then similar to the distance used
in [6] and is given by:

$$d_{L_1}(x, y) = \sum_{r=1}^{b}\,|h_r(x) - h_r(y)|$$

where b is the number of histogram bins and $h_r(x)$ is the r-th bin
of the local 3D color histogram in the vicinity of the pixel x.
We use the gradual focusing decision [4] after partition
convergence. The following pictures show typical segmentation
results.
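
As a small illustration of this distance, the Python sketch below computes the L1 distance between two histograms given as flat lists of bin values. The function name and the flattening of the 3D histogram into a single list are assumptions made for the example.

```python
def l1_histogram_distance(hx, hy):
    """L1 distance between two histograms with the same number of bins."""
    if len(hx) != len(hy):
        raise ValueError("histograms must have the same number of bins")
    return sum(abs(a - b) for a, b in zip(hx, hy))
```

For normalized histograms the value lies between 0 (identical local color distributions) and 2 (no common bin).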




fig. 1 Original color image 




fig. 2 Segmentation result by local 3D color 
histogram competitive clustering 

More segmentation results are shown at
http://www-rocq.inria.fr/~boujemaa/Partielle2.html. The obtained
regions will be used as query masks on which Surfimage [8] image
signatures will be computed to search for partial query similarity.




fig. 3 Segmentation result using only color 
components as feature vector 




fig. 4 Original images and its clustering based 
segmentation 
6. Conclusion

In this paper, we have addressed the generalization of the
competitive agglomeration clustering algorithm [1], which explains
its relationship with another, Bayesian clustering algorithm [2].
This generalization suggests a new use of the cluster validity
criteria continually presented in the literature. As a consequence,
we propose new objective functions, which are not unique. We present
results for color image segmentation in the context of image
retrieval by content.
A NEW FUZZY CLUSTERING VALIDITY CRITERION AND ITS APPLICATION TO
COLOR IMAGE SEGMENTATION



Xuanli Lisa Xie and Gerardo Beni
Center for Robotic Systems
University of California, Santa Barbara, CA 93106



I. Introduction

The engineering literature has paid very little attention to cluster
validity issues [1], limiting the effort to presenting new clustering
algorithms which perform reasonably well on a few data sets. In
particular, the issue of validity for clustering of fuzzy data sets
has been neglected (with few notable exceptions). On the other hand,
if fuzzy cluster analysis is to make a significant contribution to
engineering applications, much more attention must be paid to
fundamental questions of cluster tendency. Recently, validity of
fuzzy clustering has been discussed in applications to mixtures of
normal distributions [4,5]. Also, applications to distributed
perception [6] have been proposed which rely in an essential way on
good validity criteria for fuzzy clustering.

In the latter applications [7], separated sensors observe a common
object. They communicate to a central processor not (perceptual) data
(which, due to their size, cannot be transmitted in real time) but
decisions (which, due to their smaller bit-size, can be transmitted
in real time). In such cases, a fundamental decision is often the
determination of the number of 'objects' observed, i.e. the validity
of the clustering procedure. Since higher level decisions by the
central processor are based on the validity of these separated
clustering procedures, it is essential that an efficient method is
developed for fuzzy clustering validity.

Generally the issue of cluster validity is a broad one and involves
many questions. In view of the applications to distributed
perception, in this paper we focus on the validity of a partition.
The answer is sought, as is generally accepted [1,3], in measures of
separation among clusters and cohesion within clusters.

II. Clustering Algorithm and Validity Criteria
Clustering is a tool that attempts to assess the relationships among
patterns of the data set by organizing the patterns into groups or
clusters such that patterns within a cluster are more similar to each
other than are patterns belonging to different clusters.

There are many clustering algorithms [2,3,8], but an important
question in clustering is 'cluster validity', which deals with the
significance of the structure imposed by a clustering method. The
performance of many existing clustering algorithms is studied in [9].
We will briefly review the fuzzy c-means clustering algorithm below
for later reference.

A. Fuzzy c-means clustering algorithm

The fuzzy c-means (FCM) clustering algorithm (Bezdek [2]), the fuzzy
equivalent of the nearest mean "hard" clustering algorithm (Duda and
Hart [10]), minimizes the following objective function with respect
to fuzzy membership $\mu_{ij}$ and cluster centroid $V_i$:

$$J_m = \sum_{i=1}^{c}\sum_{j=1}^{n}(\mu_{ij})^{m}\, d^2(X_j, V_i) \qquad (1)$$

where

$$d^2(X_j, V_i) = (X_j - V_i)^T A\, (X_j - V_i) \qquad (2)$$

A is a p×p positive definite matrix, p is the dimension of the
vectors $X_j$ (j = 1, 2, ..., n), c is the number of clusters, n is
the number of vectors (or data points), and m > 1 is the fuzziness
index [2].

The FCM algorithm is executed in the following steps [2]:

1. Initialize memberships $\mu_{ij}$ of $X_j$ belonging to cluster i
such that

$$\sum_{i=1}^{c}\mu_{ij} = 1 \qquad (3)$$

2. Compute the fuzzy centroid $V_i$ for i = 1, 2, ..., c using

$$V_i = \frac{\sum_{j=1}^{n}(\mu_{ij})^{m} X_j}{\sum_{j=1}^{n}(\mu_{ij})^{m}} \qquad (4)$$

3. Update the fuzzy membership $\mu_{ij}$ using

$$\mu_{ij} = \frac{\left(1/d^2(X_j, V_i)\right)^{1/(m-1)}}{\sum_{k=1}^{c}\left(1/d^2(X_j, V_k)\right)^{1/(m-1)}} \qquad (5)$$

4. Repeat steps 2 and 3 until the value of $J_m$ is no longer
decreasing.

The FCM algorithm always converges to strict local minima of $J_m$
starting from an initial guess of $\mu_{ij}$, but different choices
of initial $\mu_{ij}$ might lead to different local minima [4]. This
algorithm is more suitable for clustering according to a geometric
distance measure, while other algorithms, e.g. the Maximum-Likelihood
Estimation algorithm [10,11], may be more suitable in other cases.
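
The centroid and membership updates can be sketched in a few lines of Python. This is an illustrative version only: it assumes the standard FCM updates with the Euclidean distance (A taken as the identity matrix), points stored as lists of floats, and a hypothetical function name `fcm_step`; the handling of a point that coincides with a centroid is a choice made for this example.

```python
def fcm_step(points, u, m=2.0):
    """One FCM iteration: centroid update, then membership update.

    points[j] : list of p coordinates of data point j
    u[i][j]   : membership of point j in cluster i
    """
    c, n, p = len(u), len(points), len(points[0])
    centroids = []
    for i in range(c):
        w = [u[i][j] ** m for j in range(n)]
        total = sum(w)
        centroids.append(
            [sum(w[j] * points[j][k] for j in range(n)) / total for k in range(p)]
        )
    new_u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        d2 = [
            sum((points[j][k] - centroids[i][k]) ** 2 for k in range(p))
            for i in range(c)
        ]
        zero = [i for i in range(c) if d2[i] == 0.0]
        if zero:
            # Point sits exactly on a centroid: share full membership among them.
            for i in zero:
                new_u[i][j] = 1.0 / len(zero)
            continue
        inv = [(1.0 / x) ** (1.0 / (m - 1.0)) for x in d2]
        inv_sum = sum(inv)
        for i in range(c):
            new_u[i][j] = inv[i] / inv_sum
    return centroids, new_u
```

Calling `fcm_step` repeatedly and stopping when the objective no longer decreases reproduces the loop of steps 2 to 4.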

B. Validity criteria for hard and fuzzy clustering

A well established hard cluster validity criterion is the separation
index $D_1$ (Dunn [13]), which identifies 'compact, separate' (CS)
clusters and is defined by

$$D_1 = \min_{i}\left\{\min_{j \ne i}\left(\frac{dis(u_i, u_j)}{\max_{k}\{dia(u_k)\}}\right)\right\} \qquad (6)$$

where

$$dia(u_k) = \max_{X_i, X_j \in u_k} d(X_i, X_j) \qquad (7)$$

$$dis(u_i, u_j) = \min_{X_i \in u_i,\ X_j \in u_j} d(X_i, X_j) \qquad (8)$$

d is any metric induced by an inner product on $R^p$. The validity
measure of CS clustering of X solves
$\max_{c}\{\max_{\Omega_c} D_1\}$, where $\Omega_c$ denotes the
optimality candidates at fixed c. It is proved that a hard
c-partition of X contains c compact, separate (CS) clusters if
$D_1 > 1$. Furthermore, there is at most one CS partition of X if
$D_1 > 1$. The main drawback with direct implementation of this
validity measure is computational, since calculating $D_1$ becomes
computationally very expensive as c and n increase. Another validity
criterion which also measures compact and separate clusters is
introduced by Davies and Bouldin [14]. The major difference from
$D_1$ is that it considers the average case by using the average
error of each class. Jain and Moreau [15] also defined a method for
cluster validity by using a bootstrap technique that could be used
with any clustering algorithm.
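
The cost of the separation index is easy to see in code. The following Python sketch assumes the usual definition (smallest distance between points of different clusters divided by the largest cluster diameter) with the Euclidean metric; the function name and the representation of a hard partition as a list of point lists are assumptions for the example.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Separation index of a hard partition given as a list of point lists."""
    # Largest diameter: maximum distance between two points of one cluster.
    max_dia = max(
        max((math.dist(a, b) for a, b in combinations(cl, 2)), default=0.0)
        for cl in clusters
    )
    # Smallest distance between points belonging to two different clusters.
    min_dis = min(
        math.dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    return math.inf if max_dia == 0.0 else min_dis / max_dia
```

Both terms loop over pairs of points, so the cost grows quadratically with n, which is the computational drawback noted above.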

As a fuzzy clustering validity function, Bezdek [16] designed the
partition coefficient F to measure the amount of "overlap" between
clusters:

$$F = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}(\mu_{ij})^{2} \qquad (9)$$

In this form F is inversely proportional to the overall average
overlap between pairs of fuzzy subsets. In particular, there is no
membership sharing between any pairs of fuzzy clusters if F = 1.
Solving $\max_{c}\{\max_{\Omega_c}(F)\}$ (c = 2, 3, ..., n-1) is
assumed to produce a valid clustering of the data set X.
Disadvantages of the partition coefficient are the lack of direct
connection to a geometrical property and its monotonic decreasing
tendency with c. There are several other criteria in the literature
which also measure the amount of fuzziness, such as classification
entropy [17], proportion exponent [18], uniform data functional [19],
non-fuzziness index [20] and information ratio [21]. Those criteria
share a similar drawback with F, that is, the lack of direct
connection to the geometrical property of the data set.
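
For reference, a minimal Python sketch of the partition coefficient, assuming the usual form (mean over data points of the sum of squared memberships); the function name is illustrative.

```python
def partition_coefficient(u):
    """Partition coefficient F of a membership matrix u[i][j]."""
    n = len(u[0])
    return sum(x * x for row in u for x in row) / n
```

F equals 1 for a hard partition and drops to 1/c when every membership equals 1/c; it uses no distances at all, which illustrates the missing link to geometry.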

Gunderson [22] introduced a separation coefficient which takes into
account geometrical properties. This validity criterion is designed
to identify compact and separated clusters (which is similar to our
goal). However, this method cannot be directly applied. It works on
fuzzy clustering outputs by first converting them to hard ones. Since
there are many ways one can convert fuzzy partitions to hard ones,
this method shares the shortcomings of non-uniqueness of transferring
from fuzzy partitions to hard partitions.

III. A Compact and Separated Fuzzy Validity Criterion
In this section we define S as a fuzzy clustering validity function
which measures the overall average compactness and separation of a
fuzzy c-partition. We also give a heuristic rationale and an
implementation strategy for the use of this function.

A. Definition of a new fuzzy clustering validity function S

Consider a fuzzy c-partition of the data set
$X = \{X_j;\ j = 1, 2, \ldots, n\}$ with $V_i$ (i = 1, 2, ..., c)
the centroid of each cluster and $\mu_{ij}$ (i = 1, 2, ..., c;
j = 1, 2, ..., n) as the fuzzy membership of data point j (also
called vector j) belonging to class i.

Definition 1: $d_{ij} = \mu_{ij}\,\|X_j - V_i\|$ is called the Fuzzy
Distance of $X_j$ to class i.

Note that $\|\cdot\|$ is the usual Euclidean norm. Thus $d_{ij}$ is
just the Euclidean distance between $X_j$ and $V_i$ weighted by the
fuzzy membership of data point j belonging to class i.

Definition 2: $n_i = \sum_j \mu_{ij}$ is called the Fuzzy Number of
vectors in class i.
Note that $\sum_i n_i = n$, where n is a 'hard' number, e.g. the
total number of data points in X. In the extreme case, when the
partition is hard, $n_i$ becomes exactly the number of vectors in
class i.
Definition 3: for each class i, the summation of the squares of the
Fuzzy Distance of each data point, denoted by $\sigma_i$, is called
the Variation of class i, that is:
$\sigma_i = \sum_j (d_{ij})^2 = (d_{i1})^2 + (d_{i2})^2 + \ldots + (d_{in})^2$.
The summation of the variations of all classes, denoted by $\sigma$,
is called the Total Variation of data set X with respect to the fuzzy
c-partition, i.e., $\sigma = \sum_i \sigma_i = \sum_i \sum_j (d_{ij})^2$.

Note that $\sigma_i$ and $\sigma$ depend on the data set, but more
importantly they depend on the fuzzy c-partition, i.e., the
$\mu_{ij}$'s and $V_i$'s. A better c-partition should result in a
smaller $\sigma$. These values are not normalized, and they depend on
how we choose our coordinate system. For example, if the fuzzy
c-partition is obtained by using the fuzzy c-means algorithm with
m = 2, the value of $\sigma$ will be equal to the c-means objective
function $J_2$ (Sec. II.A.1).

Definition 4: the ratio, denoted by $\pi$, of the total variation to
the size of the data set, that is, $\pi = (\sigma/n)$, is called the
Compactness of the fuzzy c-partition of the data set.

The value $\pi$ measures how compact each and every class is. The
more compact the classes are, the smaller $\pi$ is. $\pi$ is a
function of the distribution characteristics of the data set itself,
and more importantly a function of how we divide the data points into
clusters. But it is independent of the number of data points. For a
given data set, a smaller $\pi$ indicates that we have reached a
partition with more compact clusters, thus indicating a better
partition. Gath and Geva [4] introduced the fuzzy hypervolume, which
is the probability weighted total variation. This validity measure
can identify ellipsoidal clusters and overlapped clusters. By
incorporating covariance into the distance matrix A in equation (2),
$\pi$ can also identify ellipsoidal clusters.



Definition 5: the quantity $\pi_i = (\sigma_i/n_i)$ is called the
Compactness of class i.

Since $n_i$ is the number of vectors in class i, $\sigma_i/n_i$ is
the average variation in class i. We have defined the compactness of
the fuzzy c-partition in terms of total variation and number of
vectors. After defining $\pi_i$, we have some alternative ways to
define the compactness of the fuzzy c-partition, such as:
$\pi = (\sum_i \pi_i)/c$, i.e., the average compactness of each
class; or $\pi = \max_i \pi_i$, i.e., the worst case. It can be shown
that both ways have a similar effect to definition 4.

Definition 6: $s = (d_{min})^2$ is called the Separation of the fuzzy
c-partition, where $d_{min}$ is the minimum distance between cluster
centroids, i.e.:

$$d_{min} = \min_{i \ne j}\|V_i - V_j\|$$

A larger s indicates that all the clusters are separated.
Definition 7: the Compactness and Separation Validity Function S is
defined as the ratio of the compactness $\pi$ to the separation s,
i.e. $S = \pi/s$.

After substituting for $\pi$ and s, we get
$S = (\sigma/n)/(d_{min})^2$. A smaller S indicates a partition in
which all the clusters are overall compact and separate from each
other. Thus, our goal is to find the fuzzy c-partition with the
smallest S.

S can be explicitly written as:

$$S = \frac{\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{2}\,\|V_i - X_j\|^{2}}{n\,\min_{i \ne j}\|V_i - V_j\|^{2}} \qquad (10)$$

We note that the definition of S is independent of the algorithm used
to obtain $\mu_{ij}$. Thus it is not internal to the clustering
algorithm. For the FCM algorithm with m = 2, S can be shown to be:

$$S = \frac{J_2}{n\,(d_{min})^{2}} \qquad (11)$$

which is very easy to calculate. More importantly, minimizing S
corresponds to minimizing $J_m$, which is the goal of FCM. The
additional factor in S is $(d_{min})^2$, which is the separation
measurement. The more separate the clusters, the larger $(d_{min})^2$
and the smaller S. Thus, the smallest S indeed indicates a valid
optimal partition.
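
The validity function can be computed directly from the points, the centroids and the memberships. The Python sketch below follows the definition of S as total variation divided by n times the squared minimum centroid distance, with the Euclidean norm; the function name and data layout are illustrative, and at least two distinct centroids are assumed.

```python
def xie_beni_index(points, centroids, u):
    """Compactness and separation validity function S of a fuzzy c-partition.

    points[j]    : coordinates of data point j
    centroids[i] : coordinates of centroid i (at least two, all distinct)
    u[i][j]      : membership of point j in cluster i
    """
    n, c = len(points), len(centroids)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Total variation: squared fuzzy distances summed over classes and points.
    total_variation = sum(
        u[i][j] ** 2 * sq_dist(points[j], centroids[i])
        for i in range(c)
        for j in range(n)
    )
    # Separation: squared minimum distance between two centroids.
    separation = min(
        sq_dist(centroids[i], centroids[k])
        for i in range(c)
        for k in range(i + 1, c)
    )
    return total_variation / (n * separation)
```

For two tight, well separated groups the value is small; it grows when clusters spread out or when two centroids move close together.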

We note, however, that S is still monotonically decreasing when c
gets very large and close to n (under certain statistical
assumptions). One thing we can do is to impose an ad hoc punishing
function [23] to eliminate this decreasing tendency. How to choose
this function is not discussed here. Nevertheless, we shall see that
even without a punishing function the validity function S provides a
well defined method to solve the validity problem.

There are some existing validity criteria in the literature which
measure compact and separate clustering. The separation coefficient in [22]
considers the worst case while S is more interested in the total average case.
Furthermore, the separation coefficient cannot be directly applied to fuzzy
clustering, as mentioned before. In [14], Davies and Bouldin introduce a
hard partition validity criterion R. It is roughly related to S by R=S^ if S is 
used for hard partitions. However, from our experience, S/c has a
decreasing tendency as c increases.

B. Case of validity function S < 1/8

As discussed previously, an identifiable substructure results in a
small S, but it is not immediately obvious how small S is. On the
other hand, it is possible to find a heuristic threshold $S_0$ such
that if $S < S_0$ the fuzzy partition is overall compact and separate.

Consider two circular clusters with radius R and uniform vector
density $\rho$. If the c-partition is separated, then

$$R \le \frac{d_{min}}{2} \qquad (12)$$

where $d_{min}$ is as above. Let $\Delta A$ be a small area inside a
cluster and r be the distance from $\Delta A$ to $V_i$, the cluster
centroid. Thus the variation $\sigma_i$ of the cluster i can be
written as:

$$\sigma_i = \int r^{2}\rho\, dA = \int_{0}^{R} r^{2}\rho\, 2\pi r\, dr = \frac{\pi R^{4}\rho}{2}$$

The number of vectors in the cluster i is $n_i = \pi R^{2}\rho$. Let
us suppose that the data set is composed of classes of equal
variance; then the total variation $\sigma$ can be written as
$\sigma = c\,\sigma_i$ and the total number of vectors as
$n = c\,n_i$. Thus the validity function can be written as

$$S = \frac{\sigma/n}{(d_{min})^{2}} = \frac{1}{2}\,\frac{R^{2}}{(d_{min})^{2}}$$

If the clusters are separate, according to equation (12),

$$S \le \frac{1}{2}\cdot\frac{(d_{min}/2)^{2}}{(d_{min})^{2}} = \frac{1}{8}$$

Thus we may consider the c-partition as compact and separate if
S < 1/8. We have assumed that the data is uniformly distributed in
obtaining the value $S_0 = 1/8$. However, the actual data set will
not always be of this kind. Thus we should not attribute too much
significance to the actual value $S_0 = 1/8$. If we use a more strict
density function or choose the density function with pre-knowledge
about the data, the value of $S_0$ will get smaller. This value just
gives us a guideline, and S < 1/8 is neither a necessary nor a
sufficient condition. Nevertheless, from this result we are led to
believe that there exists a certain value such that S less than this
value corresponds to a unique partition and, more importantly, that
the fuzzy c-partition for S less than this value is a global optimal
partition (but it does not mean that S > 1/8 implies clusters that
are not compact and separate). We shall see that this is actually the
case in Section IV.A.

C. Minimization of S

Since a smaller S means a more compact and separate c-partition, we
assume that the minimum S partition is the most valid. Thus, a
heuristic strategy to use S as a validity function is as follows.
Using any fuzzy clustering algorithm, find one or more optimal
c-partitions of the data set X for each c = 2, 3, ..., n-1. Let
$\Omega_c$ denote the optimality candidates at each c; then the
solution of

$$\min_{2 \le c \le n-1}\Big\{\min_{\Omega_c} S\Big\}$$

is assumed to yield the most valid fuzzy clustering of the data set X.

D. Implementation strategy

Once we have defined the validity function S, our implementing
strategy can be summarized into the following pseudo-algorithm:

1. initialize: c <- 2, S* <- ∞, c* <- 1;

2. initialize fuzzy membership $\mu_{ij}$;

3. use any stable fuzzy clustering algorithm to
update centroids $V_i$ and $\mu_{ij}$;

4. do convergence test; if negative goto 3;

5. compute function S;

6. if S < S*, S* <- S, c* <- c;

7. if optimal candidate not found, goto 2;

8. c <- c+1, if c = stop-value, stop;

9. goto 2;

Steps 2-4 are the fuzzy c-partition algorithm. For FCM, equations (4)
and (5) can be used. The convergence test can be
$|(J_m)_{q+1} - (J_m)_q| <$ epsilon (e.g. 0.001), where q is an
iteration index and $J_m$ is as in equation (1). With m = 2, S can be
easily calculated as in equation (11).
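
The outer loop of the pseudo-algorithm can be sketched in Python as follows. This is a simplified illustration: the clustering routine and the validity function are passed in as hypothetical callables (`cluster_fn`, `validity_fn`), each value of c is clustered only once (the restart of step 7 is omitted), and the loop stops just before `c_stop`, mirroring step 8.

```python
import math


def select_cluster_count(data, cluster_fn, validity_fn, c_stop):
    """Return (c*, S*, partition) minimizing the validity function.

    cluster_fn(data, c)            -> (centroids, memberships), run to convergence
    validity_fn(data, centroids, u) -> validity value S (smaller is better)
    """
    best_s, best_c, best_partition = math.inf, 1, None
    for c in range(2, c_stop):  # c = 2, ..., c_stop - 1
        centroids, u = cluster_fn(data, c)
        s = validity_fn(data, centroids, u)
        if s < best_s:
            best_s, best_c, best_partition = s, c, (centroids, u)
    return best_c, best_s, best_partition
```

Any stable fuzzy clustering routine can be plugged in as `cluster_fn`, and the choice of `c_stop` corresponds to the stop-value of step 8.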

In step 2, the initial values of $\mu_{ij}$ can be assigned randomly
and then normalized to satisfy $\sum_i \mu_{ij} = 1$ for all j.
Another way to initialize $\mu_{ij}$ for c > 2 is as follows. When
the c partition is increased to c+1, the guess of the initial
membership for the current c+1 partition can be obtained by assigning
the fuzziest vector in the previous c partition to be the most
determinant vector in the current partition. This method can be
described as follows:

$$\mu_{ik} = 0 \quad \text{for } i = 1, \ldots, c;$$

where

$$k = \arg\min_{j}\,(\mu_{dj}), \quad j = 1, \ldots, n,$$

$$\mu_{dj} = \max_i(\mu_{ij}) - \min_i(\mu_{ij}).$$

Note that this second method is just another heuristic way for
initialization. In many cases, it proves to lead to faster
convergence. We may try to mix these two methods together to take
advantage of their merits.

A problem of implementation is that S will have a tendency to
eventually decrease when c is large. So the value of S is meaningless
when c gets close to n. Actually, this is not a serious problem,
since this phenomenon will not appear in a quite large range of c and
since the number of clusters in the clustering problem usually is
much smaller than the number of data points. Thus we can use the
following three heuristic methods to determine the stop-value of
step 8.

First, as mentioned in Section III.A, we can use a punishing function
which is imposed on S to counter this decreasing tendency. In Dunn
[23], the 'normalization and standardization of a validity function'
is a simple example of the idea of a punishing function. The second
method is that of plotting the optimal value of S for c = 2 to n-1,
then selecting the starting point of the monotonically decreasing
tendency as the maximum c to be considered. Let $c_{max}$ denote such
a c; then we find c by solving

$$\min_{2 \le c \le c_{max}}\Big\{\min_{\Omega_c} S\Big\}$$

The third way is application dependent. For most applications we do
not need to compute S for very large c. It is almost always the case
that c at the stop-value is << n. In this situation, we can either
choose the maximum c according to pre-knowledge or, e.g., let
$c_{max} = n/3$, which very likely would not reach the starting point
of the decreasing tendency.

IV. Mathematical Justification
We have already defined the new validity function and given an
implementation strategy to use this function. In this section, we
will mathematically justify this new fuzzy validity function via its
relationship to a well established hard partition validity measure,
and also give numerical examples with the comparison results in
section V.

A. Uniqueness and global optimality of the c-partition

The separation index $D_1$ (proposed by Dunn [13]) is a hard
c-partition clustering validity criterion. If $D_1 > 1$, unique
compact and separated hard clusters have been found. This result
turns out to be useful also for fuzzy clustering validity. In fact we
may expect that if the data set X really has a distinct substructure,
i.e. hard clusters, a fuzzy partitioning algorithm should produce
relatively hard memberships $\mu_{ij}$ and small total variations. We
can prove that if the optimal solution $D_1$ becomes sufficiently
large, the optimal validity function S will be very small, which
means that a unique c-partition has been found. The proof of this is
as follows.
Definition 8: Let $\mu_{ij}$ (i = 1, ..., c; j = 1, ..., n) be the
membership of any fuzzy c-partition. The corresponding hard
c-partition of $\mu_{ij}$ is defined as $\omega_{ij}$:

for j = 1, 2, ..., n:

$\omega_{ij} = 1$ if $i = \arg\max_i(\mu_{ij})$;

$\omega_{ij} = 0$ otherwise.
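
The conversion to the corresponding hard partition is a per-point argmax, as the following Python sketch shows. The function name is illustrative, and ties are resolved in favor of the lowest cluster index, which is a choice made for this example rather than something the definition specifies.

```python
def corresponding_hard_partition(u):
    """Hard partition w[i][j] obtained from fuzzy memberships u[i][j]."""
    c, n = len(u), len(u[0])
    w = [[0] * n for _ in range(c)]
    for j in range(n):
        # Cluster with the largest membership for point j (first one on ties).
        winner = max(range(c), key=lambda i: u[i][j])
        w[winner][j] = 1
    return w
```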
Theorem 1: For any c = 2, ..., n-1, let S be the overall compact and
separated validity function of any fuzzy partition, and $D_1$ be the
separation index of the corresponding hard partition; then we have:

$$S \le \frac{1}{(D_1)^{2}}$$

Proof: Let the fuzzy c-partition be an optimal partition of the data
set $X = \{X_j;\ j = 1, 2, \ldots, n\}$ with $V_i$ (i = 1, 2, ..., c)
the centroids of each class $u_i$, and $\mu_{ij}$ the fuzzy
membership of the data point $X_j$ belonging to class $u_i$. The
total variation $\sigma_{opt}$ of the optimal fuzzy c-partition is
defined in definition 3. Thus the total variation $\sigma_H$ of the
corresponding hard c-partition is

$$\sigma_H = \sum_{i=1}^{c}\sum_{j=1}^{n}\omega_{ij}\,\|X_j - V_i\|^{2}$$

From the definitions of $\sigma_{opt}$ and $\sigma_H$ above, we can
get:

$$\sigma_{opt} \le \sigma_H$$

Suppose that the centroid $V_i$ is inside the boundary of cluster i
for i = 1 to c. Then

$$\|X_j - V_i\| \le dia(u_i)$$

for $X_j \in u_i$, where $dia(u_i)$ is defined in equation (7). We
thus have

$$\sigma_H \le n\,\max_i\{dia^{2}(u_i)\}$$

We also have that $(d_{min})^{2} \ge \min\{dis^{2}(u_i, u_j)\}$, where
$dis(u_i, u_j)$ was defined in equation (8), thus

$$\frac{\sigma_{opt}}{n\,(d_{min})^{2}} \le \frac{\max_k\{dia^{2}(u_k)\}}{\min_{i \ne j}\{dis^{2}(u_i, u_j)\}}$$

Using equation (6) and (10), we get:

$$S \le \frac{1}{(D_1)^{2}}$$
Evidently, S becomes arbitrarily small as $D_1$ grows without bound.
As mentioned, it has been proved by Dunn [13] that if $D_1 > 1$ the
hard c-partition is unique. Thus, if the data set has a distinct
substructure and the fuzzy partition algorithm has found it, then the
corresponding S < 1.

This is consistent with equation (13). But a more interesting
question is whether there exists a value such that S smaller than
this value indicates the existence of a distinct substructure and the
discovery of a unique fuzzy c-partition. From the above discussion,
this seems to be very likely, and indeed we show below (theorem 2)
that this is the case. Let us define some terms first.

Definition 9: Let fu be any local optimal fuzzy c-partition, hu being the
corresponding hard c-partition. Since $S \le 1/(D_1)^2$, there exists a value $\partial_{fu}$
($\partial_{fu} \le 1$) such that $S = \partial_{fu}/(D_1)^2$. This $\partial_{fu}$ is called the fuzzy functional
coefficient.

Clearly, each fuzzy partition corresponds to one $\partial_{fu}$. We denote the
minimum functional coefficient as $\partial_{min}$, that is, $\partial_{min} = \min_{fu} \{\partial_{fu}\}$.
Thus, for any fu we have:

$\partial_{min} \le \partial_{fu}$. (14)
Theorem 2: Suppose the fuzzy clustering algorithm is stable, then for
every c, $2 \le c < n$, whenever $S < \partial_{min}$ the corresponding fuzzy partition is
unique. Furthermore, this partition is globally optimal.
Proof: Let fu be any local optimal fuzzy c-partition with function S and hu
the corresponding hard c-partition with separation index $D_1$. According to
theorem 1 we have $S \le 1/(D_1)^2$. And there is a $\partial_{fu}$ such that

$S = \partial_{fu}/(D_1)^2$.

Thus, by equation (14),

$S \ge \partial_{min}/(D_1)^2$.

From the above we can explicitly see that if $S < \partial_{min}$, then $D_1 > 1$. Since
it has been proven that there is a unique hard c-partition if $D_1 > 1$, we can
say that if the fuzzy c-partition validity function $S_{fu} < \partial_{min}$, the hard
c-partition hu corresponding to fu is unique, that is,

$S_{fu} < \partial_{min} \Rightarrow$ unique hard c-partition hu.
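
The step from $S < \partial_{min}$ to $D_1 > 1$ can be written out as a single chain. The restatement below assumes $\partial_{min} > 0$, which holds whenever S is strictly positive for every partition considered; this positivity assumption is not stated explicitly in the text.

```latex
\frac{\partial_{min}}{(D_1)^2}
  \;\le\; \frac{\partial_{fu}}{(D_1)^2} \;=\; S \;<\; \partial_{min}
\quad\Longrightarrow\quad
\frac{1}{(D_1)^2} < 1
\quad\Longrightarrow\quad
D_1 > 1 .
```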

Now we prove that $S < \partial_{min}$ corresponds to a unique fuzzy
c-partition. In fact, suppose it does not, then there exist at least two different
local optimal fuzzy c-partitions fu, fv and the corresponding hard
c-partitions hu, hv such that $S_u < \partial_{min}$, $S_v < \partial_{min}$, that is,

$S_u < \partial_{min} \Rightarrow$ hu,

$S_v < \partial_{min} \Rightarrow$ hv.

But we have shown that if $S < \partial_{min}$ the corresponding hard c-partition is
unique, therefore hu = hv. Since by hypothesis the fuzzy c-partition
algorithm we used is convergent, for the same initial condition, it will
converge to the same local optimal solution. Let us choose the hard
partition results of hu and hv respectively as the initial values of the fuzzy
c-partition algorithm. Since hu = hv, thus fu = fv. This means that the
fuzzy c-partitions fu and fv must be the same partitions if $S < \partial_{min}$, and $S_u$
must be the same as $S_v$. Thus there exists at most one fuzzy c-partition
such that $S < \partial_{min}$ and this indicates its uniqueness and global optimality.
The above proof shows that $\partial_{min}$ is very important. But
unfortunately we cannot easily calculate it, although its existence is
known.



V. Examples

A. Validity function S and separation index $D_1$

Fig. 1 depicts a set X of n two-dimensional data points arranged
in c = 3 visually apparent clusters. The fuzzy c-means algorithm has been
carried out using Euclidean norm, m = 2.0, epsilon = 0.005 for c = 2, 3, 4, 5,
and 6. The nearest hard c-partition obtained from X in the sense of
maximum membership is obtained using definition 8. Table 1 compares
various results from Bezdek [2] with our calculation of S.

From table 1, we see that the hard 3-partition of X displayed in
column 5 is the unique CS hard clustering of X since $D_1 > 1$. In other words,
the three visually apparent clusters of fig. 1 are unique CS clusters. We see
that the hard partitions in table 1 suggested by the fuzzy c-means for c ≠ 3
are not of this type. Observe also that F, H, and S identify c* = 3 quite
strongly, but not P, which indicates c* = 6. Furthermore, since $S \le 1/(D_1)^2$
there is a $\partial$ such that $S = \partial/(D_1)^2$. We use the values of S and $D_1$ from
table 1 to calculate $\partial$ to see how S and $\partial$ relate to each other. The values
of $D_1$, S and $\partial$ are listed in table 2.
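
The way each index selects its preferred number of clusters can be reproduced from the tabulated values. The sketch below is illustrative only: the numbers are those of Table 1, the column assignment (F, H, P, S, $D_1$) follows the table as read here, and the optimization direction of each index (maximize F, P, and $D_1$; minimize H and S) is an assumption chosen to match the starred entries.

```python
# Values from Table 1, keyed by the number of clusters c: (F, H, P, S, D1).
table1 = {
    2: (0.85, 0.23, 62.3, 0.06, 0.05),
    3: (0.97, 0.09, 179.4, 0.02, 2.17),
    4: (0.85, 0.26, 225.0, 1.08, 0.33),
    5: (0.75, 0.43, 207.0, 0.21, 0.33),
    6: (0.72, 0.50, 278.9, 0.45, 0.33),
}

# Column position and assumed direction of optimization for each index.
criteria = {"F": (0, max), "H": (1, min), "P": (2, max), "S": (3, min), "D1": (4, max)}

best = {
    name: pick(table1, key=lambda c: table1[c][col])
    for name, (col, pick) in criteria.items()
}
print(best)
```

Under these assumptions every index except P selects c* = 3, in agreement with the discussion above.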

To sum up, we have seen (table 1, column 5) that there exists a
unique hard c-partition when c* = 3. We also see (table 2) that there exists a
$\partial_{min}$ such that $S < \partial_{min}$ indicates a unique fuzzy c-partition, that is a
global optimal solution. In fact from table 2, $S < \partial$ appears for c* = 3 only,
that is, S = 0.02 < 0.09. This is consistent with theorem 2 and quite
plausible since if the data really have a distinct substructure, the fuzzy
algorithm should yield relatively hard results too. Thus we can say that c*
= 3 is a unique fuzzy c-partition identified by S as rigorously as c* = 3 is a
unique hard c-partition identified by $D_1$. We also give an example to
1 die lesolts obuined Cram F«kIS(2^. 
Table 1

num of     partition     partition   proportion   CS         CS      CWS
clusters   coefficient   entropy     exponent     function   index   index
c          F             H           P            S          D_1     D_2

2          0.85          0.23        62.3         0.06       0.05    0.03
3          0.97*         0.09*       179.4        0.02*      2.17*   2.17*
4          0.85          0.26        225.0        1.08       0.33    0.33
5          0.75          0.43        207.0        0.21       0.33    0.33
6          0.72          0.50        278.9*       0.45       0.33    0.31

Notes: for definitions refer to [2] and [18];
* indicates the best partition.
Table 2

num of     CS         CS      coefficient   S vs. ∂
clusters   function   index
c          S          D_1     ∂

2          0.06       0.05    0.0002        >
3          0.02*      2.17*   0.09          <
4          1.08       0.33    0.10          >
5          0.21       0.33    0.02          >
6          0.45       0.33    0.05          >

Notes: * indicates the best partition.
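
The coefficient column of Table 2 follows from Definition 9 as $\partial = S\,(D_1)^2$. The short sketch below recomputes it from the tabulated S and $D_1$; the variable names are illustrative, and because the inputs are already rounded to two decimals, the recomputed values may differ slightly from the printed ones.

```python
# (S, D1) pairs from Table 2, keyed by the number of clusters c.
table2 = {
    2: (0.06, 0.05),
    3: (0.02, 2.17),
    4: (1.08, 0.33),
    5: (0.21, 0.33),
    6: (0.45, 0.33),
}

# Definition 9 rearranged: S = coef / D1**2, hence coef = S * D1**2.
coef = {c: S * D1 ** 2 for c, (S, D1) in table2.items()}

# Partitions whose validity function falls below their own coefficient.
below = [c for c, (S, _) in table2.items() if S < coef[c]]

for c in table2:
    print(c, round(coef[c], 4), "<" if c in below else ">")
```

Since $\partial = S\,(D_1)^2$, the condition $S < \partial$ is equivalent to $D_1 > 1$, which is why only c = 3 satisfies it here.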

B. Application To Computer Color Vision

Cluster analysis has been playing an important role in solving many
problems in pattern recognition and image processing. For example, it is
used for feature selection in Jain and Dubes and for image segmentation for
range image in Hoffman and Jain [26]. Image segmentation is a very
critical step in image processing because errors at this stage influence
feature extraction, classification, and interpretation at later stages.

In this section, we describe an application of our clustering criterion
to color image segmentation for recognition of defects in integrated
circuit (IC) wafers. The features of IC wafers are inherently colorful because
of the interference effects taking place on the thin films which make up the
IC structures [27]. Certain classes of IC defects can be detected by the use
of colors which are otherwise not possible to detect in grey-scale image
processing [27]. Various IC patterns manifest different colors due to the
varying thicknesses in their structure.

In particular, we are interested in color ring defect recognition. A
color ring defect is formed by a particle on the IC wafer causing a
nonuniform thin film thickness surrounding the particle. The interference
of different light wavelengths forms several co-centered color rings. The
maximum number of rings among the colors reflects the size of the defect.
Our task is to segment the color ring defect image and find the number of
colors in the image and number of rings strongly formed in each color.

For the color ring defect problem, Barth [27] used a clustering method
together with an appropriate distance threshold to successfully detect some
defects. However, the performance of the method depends very much on the
choice of the distance threshold. An improper choice of the threshold may
lead to erroneous partitions. Furthermore, the right choice of such a
threshold is unknown a priori. Another concern is the potential of the
method to give a very large number of partitions which may not be very
useful for solving problems such as the "ring" defect problem.

Here, we describe the use of our clustering validity criterion for the
segmentation of two color ring defect images taken from real samples. The
image is of 512x480 pixels. Since each defect occupies a very small part of
the image, a focusing-of-attention strategy [27] is employed, that is we
only segment a small part of the image which contains one color ring
defect. In such a way, the size of data to be processed can be largely reduced
and computing time saved. Notice that there is some noise in both
images. To reduce the noise effect, a threshold for pixel density in color
space is used. Only those colors with density larger than the threshold are
processed by using the clustering algorithm. The choice of this threshold
does not essentially affect the results. The remaining data points are
assigned to the nearest cluster. After segmentation, all pixels in each
cluster are assigned the color value of the centroid of that cluster.
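
The preprocessing and recoloring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: "pixel density" is read as the number of pixels sharing exactly the same color, the clustering step itself is omitted (the retained colors stand in for the centroids it would return), and the function names `dense_colors` and `recolor` as well as the tiny test image are hypothetical.

```python
import numpy as np

def dense_colors(image, threshold):
    """Distinct colors whose pixel count in the image exceeds the threshold."""
    flat = image.reshape(-1, image.shape[-1])
    colors, counts = np.unique(flat, axis=0, return_counts=True)
    return colors[counts > threshold]

def recolor(image, centroids):
    """Assign every pixel to its nearest centroid and paint it with that centroid."""
    flat = image.reshape(-1, image.shape[-1]).astype(float)
    d2 = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    labels = d2.argmin(axis=1)
    segmented = centroids[labels].reshape(image.shape)
    return segmented, labels.reshape(image.shape[:-1])

# Tiny 2 x 3 RGB image: three "red" pixels, two "green" pixels, one noisy pixel.
image = np.array([[[200, 0, 0], [200, 0, 0], [0, 200, 0]],
                  [[200, 0, 0], [190, 10, 0], [0, 200, 0]]])

kept = dense_colors(image, threshold=1)   # the noisy color occurs once and is dropped
centroids = kept.astype(float)            # stand-in for the clustering result
segmented, labels = recolor(image, centroids)
print(labels)
```

The noisy pixel is excluded from clustering but still receives a label afterwards, because every remaining data point is assigned to the nearest cluster.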

Fig. 2 is the picture of a color ring image which contains two color
ring defects. We focus our attention on the upper left color ring inside the
window shown in Fig. 2. There are two main visually apparent colors
("red", "green") in the window. This part of the image needs to be segmented
in order to find the number of colors existing in this defect and the number of
color rings formed in each color. The problem is to find the number of
clusters in the color space that best partitions or segments the image.
From the image inside the window, 290 distinct data points in color space
are obtained by using a pixel density threshold of 4.

The partitioning result is shown in Fig. 3. Our validity criterion
identifies the 2-partition as the best solution, thus correctly reflecting the
two visually apparent colors in the image. Fig. 4 (a-d) shows the results of
2 to 5 partitions with each segmented color displayed in a separate
window. In Fig. 4a, the lower left window shows the segmented image for
the window above it, and the colors in the right two windows are "red" and
"green", respectively, which are the actual apparent color structure in the
image. Notice that edges separating the two colors are very clear and the
rings formed in each color are quite strong. There are three clear rings in
the red color window and three clear rings in the green color window.
Thus, the maximum number of rings among the two colors is three and the
total number of rings in this defect is six.

The second best solution (see Fig. 3) is c = 3. Each segmented color for
3-partitions is shown in a separate window in Fig. 4b. A third color (dark
brown) is split from the 2-partitions. However, one can see the effect
of noise. The number of rings in this color is not clear at all and the edges
are not smooth either. Similarly (or even worse), all the other c-partitions
with c > 3 produce irregular segmentations or weaker rings.




Fig. 2 Color ring image for example B

Fig. 4a 2-partitions

Fig. 4b 3-partitions

Fig. 4c 4-partitions

Fig. 4d 5-partitions



VI. Conclusion
In summary, the main result of the paper is the introduction of a new
strategy for the determination of the degree of validity of a fuzzy partition.
The strategy has been mathematically justified via its relation to hard
partition validity measures. We have derived the relationship (theorem 1)
between this fuzzy validity function and the most general, and well defined,
hard clustering validity function ('separation index' of Dunn for which the
condition of uniqueness has already been established). By using this
relationship, we have also proven (theorem 2) the existence of a unique
fuzzy c-partition produced by the fuzzy validity function. Examples of
applications to segmentation of color images for IC defects give
encouraging results. The main advantage of the new strategy to determine
validity is its computability which allows applications to 'real-time'
engineering systems, such as color vision systems, robotic systems, and
distributed perception networks. These applications are currently under
investigation [6].